Abstract. For an $\mathrm{NP} \cap \mathrm{coNP}$ function $g$ of the Nisan-Wigderson type and a string $b$
outside its range we consider a two-player game on a common input $a$ to the function.
One player, a computationally limited Student, tries to find a bit of $g(a)$ that differs from
the corresponding bit of $b$. He can query a computationally unlimited Teacher for the
witnesses of the values of constantly many bits of $g(a)$. The Student computes the queries
from $a$ and from the Teacher's answers to his previous queries.

It was proved in [Kra11b] that if $g$ is based on a hard bit of a one-way permutation then
no Student computed by a polynomial size circuit can succeed on all $a$. In this paper we
give a lower bound on the number of inputs $a$ any such Student must fail on. Using that
we show that there is a pseudo-finite set of hard instances on which all uniform students
must fail. The hard-core set is defined in a non-standard model of true arithmetic and has
applications in a forcing construction from [Kra11a].



Introduction 

Consider a function $g : \{0,1\}^n \to \{0,1\}^m$ defined as a Nisan-Wigderson generator based
on some Boolean function $f$, cf. [NW94]. That is, there is a set system $\{J_i \subseteq [n]\}_{i \in [m]}$ such
that

• $|J_i| = \ell$, for all $i$

• $|J_i \cap J_j| \le d$, for all $i \ne j$

and the $i$-th bit of $g(x)$ equals $f(x(J_i))$, where $f : \{0,1\}^\ell \to \{0,1\}$ and $x(J_i)$ is the $\ell$-bit
string $x_{j_1} \dots x_{j_\ell}$ if

$J_i = \{j_1 < \dots < j_\ell\}$.

We are interested in the case when $f$ is a hard bit $B(v)$ of a polynomial-time computable
one-way permutation $h : \{0,1\}^\ell \to \{0,1\}^\ell$:

$f(u) := B(h^{(-1)}(u))$.
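
As a concrete illustration of the definition above, here is a minimal Python sketch of $g$ on a toy instance. Everything specific in it is an assumption made for illustration and is not taken from the paper: the 0-based positions, the four-entry lookup table `H` standing in for the permutation $h$ (it is of course not one-way), the choice of the first bit as the "hard bit", and the set system `SETS` with $n = 4$, $m = 5$, $\ell = 2$, $d = 1$.

```python
from typing import Dict, List, Sequence, Tuple

Bits = Tuple[int, ...]

# Hypothetical toy permutation h on {0,1}^2, given as a lookup table.
H: Dict[Bits, Bits] = {(0, 0): (0, 1), (0, 1): (1, 1), (1, 1): (1, 0), (1, 0): (0, 0)}
H_INV: Dict[Bits, Bits] = {image: preimage for preimage, image in H.items()}

# Hypothetical set system: five 2-element subsets of {0,1,2,3};
# any two distinct sets share at most one element.
SETS: List[List[int]] = [[0, 1], [1, 2], [2, 3], [0, 3], [0, 2]]


def hard_bit(v: Bits) -> int:
    """Toy stand-in for B(v): the first bit of v."""
    return v[0]


def f(u: Bits) -> int:
    """f(u) := B(h^(-1)(u))."""
    return hard_bit(H_INV[u])


def restrict(x: Sequence[int], J: Sequence[int]) -> Bits:
    """x(J): the bits of x at the positions in J, listed in increasing order."""
    return tuple(x[j] for j in sorted(J))


def nw_generator(x: Sequence[int]) -> List[int]:
    """The i-th output bit of g(x) is f(x(J_i))."""
    return [f(restrict(x, J)) for J in SETS]


print(nw_generator((1, 0, 1, 1)))  # [1, 0, 0, 0, 0]
```

On this toy instance $f(u)$ happens to be the negation of the second bit of $u$, so only 8 of the 32 strings of length 5 lie in the range of the generator.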

If $m > n$ there are strings $b \in \{0,1\}^m \setminus \mathrm{Rng}(g)$ and with any such string $b$ we associate the
following game $\mathcal{G}_b$. The game is played by two players, a Student and a Teacher,
both knowing $b$. They receive a common input: any $a \in \{0,1\}^n$. The Student tries to
find $i \in [m]$ such that $g(a)_i \ne b_i$, thus certifying that $b$ is outside of the range of $g$. The
Student will be computationally limited and, in particular, while he will be able to compute
the permutation $h$ he will not be able to compute the function $f$ and to find a suitable bit $i$ himself.
Instead he will compute some candidate solution $i_1$ and hand it to the Teacher. She has
unlimited computational power and will reply to the Student with the unique $v^1 \in \{0,1\}^\ell$
such that $h(v^1) = a(J_{i_1})$.

If $B(v^1) \ne b_{i_1}$ the game stops with the Student solving the task. Otherwise he computes
his next candidate solution $i_2 \in [m]$ and hands it to the Teacher, gets back $v^2$ such that
$h(v^2) = a(J_{i_2})$, etc.

In general the Student will be allowed to present up to $c$ candidate solutions, $c$ being some
parameter. If he does not find a solution, we say he failed. The Student can be modelled
by $c$ functions

$S_1(x),\ S_2(x, y^1),\ \dots,\ S_c(x, y^1, \dots, y^{c-1})$

computing his candidate solutions:

$i_k := S_k(a, v^1, \dots, v^{k-1})$, $k \le c$

where the $v^j$'s are the Teacher's replies.
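
The protocol just described can be sketched as a short simulation. This is only an illustration under stated assumptions: the table `H`, the set system `SETS`, the string `B_STRING` (which lies outside the range of the toy generator), the 0-based indexing, and the `round_robin` Student are all hypothetical and not from the paper; the "hard bit" is taken to be the first bit of the Teacher's reply.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Bits = Tuple[int, ...]
# A Student maps (input a, Teacher's replies so far) to the next candidate index.
Student = Callable[[Sequence[int], List[Bits]], int]

# Hypothetical toy permutation h on {0,1}^2 and its inverse.
H: Dict[Bits, Bits] = {(0, 0): (0, 1), (0, 1): (1, 1), (1, 1): (1, 0), (1, 0): (0, 0)}
H_INV: Dict[Bits, Bits] = {image: preimage for preimage, image in H.items()}
# Hypothetical set system with n = 4, m = 5, ell = 2.
SETS: List[List[int]] = [[0, 1], [1, 2], [2, 3], [0, 3], [0, 2]]
# Outside the toy range: output bits 1 and 4 of the toy generator always agree.
B_STRING: Bits = (0, 0, 0, 0, 1)


def play(a: Sequence[int], b: Bits, student: Student, c: int) -> Optional[int]:
    """Run the game for at most c rounds; return the winning index or None."""
    replies: List[Bits] = []
    for _ in range(c):
        i = student(a, replies)                 # candidate solution i_k
        v = H_INV[tuple(a[j] for j in sorted(SETS[i]))]  # Teacher: h(v) = a(J_i)
        if v[0] != b[i]:                        # toy hard bit B(v) differs from b_i
            return i
        replies.append(v)
    return None                                 # the Student failed


def round_robin(a: Sequence[int], replies: List[Bits]) -> int:
    """A naive Student that tries the indices 0, 1, 2, ... in order."""
    return len(replies)


print(play((0, 1, 1, 1), B_STRING, round_robin, 2))  # None
print(play((0, 1, 1, 1), B_STRING, round_robin, 5))  # 4
```

With $c = m$ the naive Student always succeeds; the interesting regime in the text is a constant $c$.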

We are interested in the case when the Student is a small circuit, i.e. the functions $S_k$
are computed by circuits of a small total size. The Teacher's moves are uniquely determined.
This Student-Teacher way of computing a function grew out of a use of Herbrand's theorem
in bounded arithmetic; the Student-Teacher formalism was introduced in [KPS90] (see there
also for an overview in a complexity-theoretic language).

Assuming that $h$ is indeed a one-way permutation it was proved in [Kra11b, Kra11a]
that for any fixed $c \ge 1$ no P/poly Student can solve the task on all inputs $a \in \{0,1\}^n$, for
$n \gg 0$. In this paper we are interested in the question whether there exists a set of hard
instances $H \subseteq \{0,1\}^n$ such that every P/poly Student fails on most $a \in H$.

This question is motivated by research in proof complexity; in particular by a conjecture
(footnote 1) that functions like $g$ are hard proof complexity generators and by a model-theoretic
approach to it based on forcing. This proof complexity motivation is discussed in detail in
[Kra11a, Part VIII] and in [Kra11b] and we shall not review it here.

In this paper we show that we can combine the lower bound argument from [Kra11a,
Chpt.31] and [Kra11b] with the model-theoretic set-up of forcing with random variables to
give a partial affirmative answer to the question about the existence of sets $H$. The
qualification "partial" means that we construct such a set of hard examples as a pseudo-finite set of
strings of a non-standard length $n$ in a model of true arithmetic and that the instances from
it are hard for uniform students, i.e. the functions $S_1(x), S_2(x, y^1), \dots, S_c(x, y^1, \dots, y^{c-1})$
defining their moves are computed by uniform algorithms. In fact, we can allow the uniform
students to use common advice strings of all lengths. This partial solution is perfectly
adequate for the purpose of the construction in [Kra11a, Chpt.31] (footnote 2) but it would be desirable
to have such hard-core sets for non-uniform students as well. This is because the forcing
models constructed in that case have some nice properties (in particular, witnessing of
quantifiers [Kra11a, Chpt.3] or saturation properties [Kra??]) that may be useful for further
research.

Footnote 1: Statement (S) in [Kra11a, Sec.31.4] or in [Kra11b], modifying Razborov's Conjecture 2 from [Raz??] and
related also to Rudich's demi-bit conjecture from [Rud97].

The paper is self-contained in the sense that the reader will be able to fully understand
the problem and its solution. However, to appreciate its relevance to the forcing construction
and subsequently for the proof complexity conjecture alluded to above one needs to consult
[Kra11a, Part VIII] or [Kra11b] (better both).

The paper is organized as follows. In Section 1 we give details of the problem and use
the argument from [Kra11b, Kra11a] to derive a lower bound on the number of inputs every
P/poly student must fail on. In Section 2 some model theory and forcing with random
variables is briefly reviewed and the pseudo-finite hard-core set is constructed in Section 3.

Relevant background can be found in [Kra95, Kra11a, Kra11b]. In particular, readers
not familiar with non-standard models can find a self-contained introduction to the topic
in the appendix in [Kra11a].

1. Hardness of the game 

We shall fix the following parameters in the definition of the function $g$:

$m := n + 1$, $\ell := n^{1/3}$, $d := \log(m)$.

In applications of Nisan-Wigderson generators $m$ is usually exponentially large but for the
purpose of proof complexity the best choice is to have $m$ as small as possible. By [NW94]
there is a set system $\{J_i\}_i$ with the required properties, and we fix any one of them.
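
The two "required properties" of the set system are easy to state as a check. The following is a small sketch, with a hypothetical function name `is_nw_design` and 0-based positions; the example family is a toy one and does not use the parameters $m = n+1$, $\ell = n^{1/3}$, $d = \log(m)$ fixed above.

```python
from itertools import combinations
from typing import Sequence


def is_nw_design(sets: Sequence[Sequence[int]], n: int, ell: int, d: int) -> bool:
    """Check |J_i| = ell for all i and |J_i & J_j| <= d for all i != j."""
    as_sets = [set(J) for J in sets]
    universe = set(range(n))
    inside = all(J <= universe for J in as_sets)          # J_i is a subset of [n]
    sizes_ok = all(len(J) == ell for J in as_sets)        # every set has size ell
    overlaps_ok = all(len(A & B) <= d for A, B in combinations(as_sets, 2))
    return inside and sizes_ok and overlaps_ok


toy_sets = [[0, 1], [1, 2], [2, 3], [0, 3], [0, 2]]
print(is_nw_design(toy_sets, n=4, ell=2, d=1))  # True
print(is_nw_design(toy_sets, n=4, ell=2, d=0))  # False
```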

Let $h$ be a polynomial-time permutation (we are interested in its restriction to $\{0,1\}^\ell$)
that is one-way and let $B(v)$ be its hard bit. In particular, by this hardness assumption we
mean that

• For any fixed $k \ge 1$ no P/poly algorithm can compute from $u \in \{0,1\}^\ell$ the value
$B(h^{(-1)}(u))$ with the advantage better than $\ell^{-k}$ over $1/2$.

Let us fix a string $b \in \{0,1\}^m \setminus \mathrm{Rng}(g)$. In [Kra11b] (a similar argument is used
in [Kra11a, Sec.31.2] for a different purpose) it was proved that no P/poly Student can
succeed in the game $\mathcal{G}_b$ on all inputs, assuming that $h$ is indeed one-way. We are
now going to sketch the argument in order to extend it a bit and to deduce a lower bound
on the number of inputs on which a Student must fail. Any details of the original argument
missing here can be found in the proof of [Kra11b, Thm.3.2].

Consider a Student that attempts to succeed in the game $\mathcal{G}_b$ on as many inputs
$a$ as possible and assume he can ask at most $c$ queries to the Teacher. The lower bound
will depend on $c$.

Denote by $W \subseteq \{0,1\}^n$ the set of all $a \in \{0,1\}^n$ on which the Student succeeds. For
$a \in \{0,1\}^n$ let us denote by $v^i(a)$ the preimage of $a(J_i)$ under $h$, for $i \in [m]$.



Footnote 2: The original construction was based on an unsuitable sample space as pointed out by S. Buss. This is
explained in the last section.

If the Student succeeded on $a$ and $i$ was his last query to the Teacher it means that

$f(a(J_i)) \ne b_i$.

Intuitively this gives us some information about the function $f$ as we can deduce its value
on the string $a(J_i)$ while receiving during the computation from the Teacher only strings
that have little to do with the string $h^{(-1)}(a(J_i))$ we would need in order to compute $f$
ourselves. The formal argument makes this intuition precise.

Assume that the Student asked $k$ queries: the candidate solutions the Student produced
were $i_1, \dots, i_k$ and the last one $i_k$ was correct. Call the $k$-tuple $(i_1, \dots, i_k)$ the trace of the
computation on $a$. In particular, $k \le c$. As the witnesses are unique the trace determines
also the Teacher's replies. A simple counting argument yields (cf. the proof of [Kra11b,
Thm.3.2])

Claim 1: There is a $k$-tuple $(i_1, \dots, i_k) \in [m]^k$ for some $k \le c$ that is the trace of
computations on at least a fraction of $\frac{2}{(3m)^k}$ of all inputs from $W$.

Fixing a trace $\mathbf{i} = (i_1, \dots, i_k)$ satisfying the claim, define for any $u \in \{0,1\}^\ell$ and
$v \in \{0,1\}^{n-\ell}$ the string $a(u,v) \in \{0,1\}^n$ as follows: put the bits of $u$ into the positions $J_{i_k}$ and
then fill the remaining $n - \ell$ positions by the bits of $v$. The following claim follows from the
proof of Claim 1 by averaging.

Claim 2: There is $e \in \{0,1\}^{n-\ell}$ s.t. at least a fraction of $\frac{1}{(3m)^k}$ more $u \in \{0,1\}^\ell$ yield
a sample $a(u,e) \in W$ whose trace is exactly $\mathbf{i}$ than those $u$ which yield $a(u,e) \in W$ whose
trace properly contains $\mathbf{i}$.

Fix one such $(n - \ell)$-tuple $e$. Call any $u \in \{0,1\}^\ell$ good if $a(u,e) \in W$.
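
The embedding $a(u, v)$ used in Claim 2 can be sketched as follows. The function name `place`, the 0-based positions, and the convention that both $u$ and $v$ are laid out in increasing order of position are assumptions of this sketch; the text only says which positions receive which bits.

```python
from typing import List, Optional, Sequence, Tuple


def place(u: Sequence[int], v: Sequence[int], J: Sequence[int], n: int) -> Tuple[int, ...]:
    """Build a(u, v): bits of u on the positions J, bits of v on the other n - |J|."""
    positions = sorted(J)
    if len(u) != len(positions) or len(v) != n - len(positions):
        raise ValueError("u must have |J| bits and v must have n - |J| bits")
    a: List[Optional[int]] = [None] * n
    for bit, j in zip(u, positions):
        a[j] = bit                      # u goes to the positions of J
    rest = iter(v)
    # every position still empty receives the next unused bit of v
    return tuple(bit if bit is not None else next(rest) for bit in a)


print(place((1, 0), (1, 1), J=[1, 3], n=4))  # (1, 1, 1, 0)
```

Fixing the second argument to a single string $e$ turns this into a map from $\{0,1\}^\ell$ to $\{0,1\}^n$, which is how the argument restricts attention to inputs that differ only on $J_{i_k}$.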

The property that two distinct sets from the set system $\{J_i\}_i$ intersect in at most $\log(m)$
positions implies that there are, for any row $i \ne i_k$, at most $m$ assignments $w$ to the bits in $J_i$
not set by $e$ (these are the bits in $J_i \cap J_{i_k}$, and there are at most $\log(m)$ of them). Any such $w$
determines, together with $e$, an assignment to variables in $J_i$ and
hence a string in $\{0,1\}^\ell$; denote it $z_w$. Let $Y_i$ be the set of all preimages of all $z_w$ in the
permutation $h$. The total bit size of all $Y_i$ together is $m^{O(1)}$.

This situation allows us to define an algorithm $C$ that uses as advice the set system
$\{J_i\}_i$, the string $b$, the trace $\mathbf{i}$, the partial assignment $e$, and all $m - 1$ sets $Y_i$. The total
size of the advice is again bounded above by $m^{O(1)}$.

The algorithm $C$ attempts to compute the function $f$ on inputs $u \in \{0,1\}^\ell$. Let $U$ be
the set of those inputs $u \in \{0,1\}^\ell$ for which the trace of $a(u,e)$ either equals $\mathbf{i}$ or starts with $\mathbf{i}$,
and let $b_0$ be the majority value of $f$ on the complement of $U$.

On input $u \in \{0,1\}^\ell$, $C$ simulates the Student's computation on the string $a := a(u,e) \in
\{0,1\}^n$. If any of the candidate solutions produced in the $j$-th query, $j = 1, \dots, k$, to the
Teacher differs from the $j$-th entry in the trace $\mathbf{i}$, $C$ halts and outputs $b_0$.

Otherwise, if the trace of the computation follows $\mathbf{i}$, $C$ uses the sets $Y_i$ to simulate the Teacher's
replies (these are unique and can be tested as correct). If the computation evolved according
to the trace $\mathbf{i}$ and reached the $k$-th step, $C$ outputs $1 - b_{i_k}$.

The algorithm $C$ outputs the bit $b_0$ in all cases except when the computation follows
the trace $\mathbf{i}$ and reaches the $k$-th step. If the computation of the Student were to actually
stop at that point then the value $1 - b_{i_k}$ is indeed equal to $f(u)$.
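
A minimal Python sketch of the algorithm $C$ as described above, on a toy instance. All concrete data are hypothetical and not from the paper: the permutation table `H`, the set system `SETS`, the 0-based indexing, the function names, and the simplification that each advice set `advice_y[i]` is passed in as a plain list of candidate preimages. The sketch follows the description literally: it outputs `b0` as soon as the simulated Student leaves the trace, and $1 - b_{i_k}$ if the trace is followed to its last entry.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Bits = Tuple[int, ...]
Student = Callable[[Sequence[int], List[Bits]], int]

# Hypothetical toy data: a permutation h on {0,1}^2 and a set system with
# n = 4, m = 5, ell = 2 (0-based positions).
H: Dict[Bits, Bits] = {(0, 0): (0, 1), (0, 1): (1, 1), (1, 1): (1, 0), (1, 0): (0, 0)}
SETS: List[List[int]] = [[0, 1], [1, 2], [2, 3], [0, 3], [0, 2]]
N = 4


def place(u: Sequence[int], e: Sequence[int], J: Sequence[int]) -> Bits:
    """a(u, e): bits of u on the positions J, bits of e on the remaining ones."""
    a: List[Optional[int]] = [None] * N
    for bit, j in zip(u, sorted(J)):
        a[j] = bit
    rest = iter(e)
    return tuple(bit if bit is not None else next(rest) for bit in a)


def run_c(u: Bits, student: Student, trace: Sequence[int], e: Bits,
          advice_y: Dict[int, List[Bits]], b: Bits, b0: int) -> int:
    """Guess f(u) by simulating the Student on a(u, e) along the fixed trace."""
    if not trace:
        raise ValueError("the trace must be non-empty")
    a = place(u, e, SETS[trace[-1]])          # u sits on the positions J_{i_k}
    replies: List[Bits] = []
    for step, expected in enumerate(trace):
        i = student(a, replies)
        if i != expected:                     # simulation left the trace
            return b0
        if step == len(trace) - 1:            # reached the k-th step
            return 1 - b[i]
        target = tuple(a[j] for j in sorted(SETS[i]))
        # simulate the Teacher: find the advice string v with h(v) = a(J_i)
        replies.append(next(y for y in advice_y[i] if H[y] == target))
    raise AssertionError("unreachable for a non-empty trace")


advice = {i: list(H) for i in range(len(SETS))}   # toy advice: all of {0,1}^2
b = (0, 0, 0, 0, 1)
print(run_c((1, 0), lambda a, r: len(r), (0, 1), (0, 0), advice, b, b0=0))  # 1
```

In the toy run the Student's real trace on $a((1,0),(0,0))$ is exactly $(0, 1)$, so the output $1$ agrees with $f((1,0))$ for the toy $f(u) := B(h^{(-1)}(u))$ with $B$ the first bit.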

Otherwise, if the computation were to continue, we do not have a way to deduce the
true value of $f(u)$. The influence of this case can be, however, bounded. By the choice of $e$
in Claim 2 the former case happens for at least a fraction of $\frac{1}{(3m)^k}$ more of all good inputs
$u \in \{0,1\}^\ell$ than the latter one. If $W = \{0,1\}^n$ (as in [Kra11b]) we would be done: all $u$ are
good and as $b_0$ is the correct value of $f$ for at least half of $u \notin U$, the algorithm $C$ would
compute $f$ with an advantage over $1/2$ at least $\frac{1}{(3m)^k}$.

If $W$ is a proper subset of $\{0,1\}^n$ this argument fails as we have no control over the
number of bad $u$ (i.e. those for which $a(u,e) \notin W$ but the trace of $a(u,e)$ contains $\mathbf{i}$), i.e. over the
size of the set $U \setminus W$. However, if we knew that the size of the complement of $W$ in $\{0,1\}^n$
is at most, say:

$s_c := \frac{1}{2} \cdot 2^{n^{1/3}} \cdot \frac{1}{(3m)^c}$

then the above argument works: $s_c$ bounds, in particular, the number of bad $u$ and the
algorithm $C$ gets the advantage at least

$\frac{1}{(3m)^k} - \frac{1}{2} \cdot \frac{1}{(3m)^c} \ge \frac{1}{2} \cdot \frac{1}{(3m)^k} \ge \frac{1}{2} \cdot \frac{1}{(3m)^c}$.
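
The chain of inequalities for the advantage can be checked with exact rational arithmetic. The sketch below assumes the chain reads $\frac{1}{(3m)^k} - \frac{1}{2}\cdot\frac{1}{(3m)^c} \ge \frac{1}{2}\cdot\frac{1}{(3m)^k} \ge \frac{1}{2}\cdot\frac{1}{(3m)^c}$ with $1 \le k \le c$; the function name `advantage_chain` is hypothetical.

```python
from fractions import Fraction
from typing import Tuple


def advantage_chain(m: int, k: int, c: int) -> Tuple[Fraction, Fraction, Fraction]:
    """Return the three terms of the chain, left to right, as exact fractions."""
    if not 1 <= k <= c:
        raise ValueError("the trace length k must satisfy 1 <= k <= c")
    gain = Fraction(1, (3 * m) ** k)        # surplus of exact-trace inputs (Claim 2)
    loss = Fraction(1, 2 * (3 * m) ** c)    # fraction of u that may be bad
    return gain - loss, gain / 2, loss


# The chain holds whenever k <= c, since then 1/(3m)^c <= 1/(3m)^k.
for m in range(2, 8):
    for c in range(1, 5):
        for k in range(1, c + 1):
            left, middle, right = advantage_chain(m, k, c)
            assert left >= middle >= right

print(advantage_chain(2, 1, 1))
```

For $k = c$ all three terms coincide, which is the worst case for the bound.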

The algorithm $C$ needs the same time as the Student except when it simulates a reply of
the Teacher and looks for an appropriate witness in one of the sets $Y_i$. This is done at most
$(c - 1)$ times and takes $m^{O(1)}$ time per one witness-search. Hence if the Student is P/poly
the total time $C$ uses is $c \cdot m^{O(1)}$.

We conclude that assuming that a P/poly Student fails on fewer than $s_c$ inputs
contradicts the hypothesis that $h$ is a one-way permutation. Hence the following statement was
established.

Lemma 1.1. Assume that the parameters $n, m, \ell$, the set system $\{J_i\}_i$ and the string $b$
satisfy the conditions imposed on them earlier. Assume also that $h$ is a polynomial-time
one-way permutation, that $B(v)$ is its hard bit and that $f(u) = B(h^{(-1)}(u))$.

Let $c \ge 1$ be arbitrary. Then for any fixed $k \ge 1$ and for any $n$ large enough, any Student
asking at most $c$ queries to the Teacher and computed by circuits of the total size $m^k$ must
fail on at least

$s_c := \frac{1}{2} \cdot 2^{n^{1/3}} \cdot \frac{1}{(3m)^c}$

inputs $a \in \{0,1\}^n$.
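
To get a feeling for the size of the bound, the following sketch evaluates it exactly. It assumes the bound reads $s_c = \frac{1}{2} \cdot 2^{n^{1/3}} \cdot \frac{1}{(3m)^c}$ with $m = n + 1$; the function name `fail_bound` and the restriction to perfect cubes $n$ (so that $n^{1/3}$ is an integer) are choices of this sketch only.

```python
from fractions import Fraction


def fail_bound(n: int, c: int) -> Fraction:
    """s_c = (1/2) * 2^(n^(1/3)) / (3m)^c with m = n + 1, for a perfect cube n."""
    ell = round(n ** (1 / 3))
    if ell ** 3 != n:
        raise ValueError("this sketch only handles perfect cubes n")
    m = n + 1
    return Fraction(2 ** ell, 2 * (3 * m) ** c)


print(fail_bound(8, 1))              # 2/27
print(float(fail_bound(10 ** 6, 3)))
```

For small $n$ the value is below 1 and says nothing; the lemma is a statement about all sufficiently large $n$, where $2^{n^{1/3}}$ dominates $(3m)^c$ for any fixed $c$.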

Let us remark that if the hardness of the permutation $h$ were exponential, in the sense
that for some $\epsilon > 0$ a circuit needs to have the size at least $2^{\ell^\epsilon}$ in order to compute the hard
bit with an advantage at least $2^{-\ell^\epsilon}$, then we could allow also exponentially large Students
posing up to $n^\delta$ queries for some small $\delta > 0$ (depending on $\epsilon$) and still get a meaningful
lower bound. Also, the whole situation can be specialized to various circuit subclasses of
P/poly such as $AC^0$ or $NC^1$, as discussed in [Kra11b].

2. Some model theory

In this section we briefly recall the set-up of forcing with random variables, enough to
formulate and prove our result in the next section. However, we shall not go into the details
specific to the construction in [Kra11a, Chpt.31] motivating this paper.

Forcing with random variables is a method for constructing models of arithmetic. A
special emphasis is given to bounded arithmetic because of its relation to proof complexity
but the method is not limited to this theory. The models are formed from random variables
on a pseudo-finite sample space and are Boolean-valued.

Let $\mathcal{M}$ be a non-standard $\aleph_1$-saturated model of true arithmetic in some
language $L$ containing the language of Peano arithmetic and having a canonical interpretation
in the standard model. Let $\Omega \in \mathcal{M}$ be an infinite set; as it is an element of the
model it is finite from the point of view of $\mathcal{M}$. Let $F \subseteq \mathcal{M}$ be any (not
necessarily definable) family of functions $\alpha : \Omega \to \mathcal{M}$.

The family $F$ will be the universe of a Boolean-valued $L$-structure $K(F)$. The symbols
of $L$ are interpreted by composition with functions from $F$. For example, for a $k$-ary function
symbol $f$ and any $\alpha_1, \dots, \alpha_k \in F$ define the function $f(\alpha_1, \dots, \alpha_k)$ by

$f(\alpha_1, \dots, \alpha_k)(\omega) := f(\alpha_1(\omega), \dots, \alpha_k(\omega))$, for $\omega \in \Omega$.

We need to assume that this function is also in $F$, i.e. that $F$ is $L$-closed in the terminology
of [Kra11a].

Every atomic $L(F)$-sentence $A$ is naturally assigned a subset $\langle\langle A \rangle\rangle \subseteq \Omega$ consisting of
those samples $\omega \in \Omega$ for which $A$ is true in $\mathcal{M}$.

Combining the idea of Loeb's measure with some measure theory it was shown in
[Kra11a, Sec.1.2] that if we factor the Boolean algebra of $\mathcal{M}$-definable subsets of $\Omega$
by the ideal of sets of an infinitesimal counting measure we get a complete Boolean algebra
$\mathcal{B}$.

The image of $\langle\langle A \rangle\rangle$ in $\mathcal{B}$ in this quotient is denoted $[\![A]\!]$. Following Boole
[Boo47] and Rasiowa-Sikorski [RS53] this determines the truth value $[\![A]\!] \in \mathcal{B}$ for
any $L(F)$-sentence $A$: $[\![\dots]\!]$ commutes with Boolean connectives and

$[\![\exists x A(x)]\!] := \bigvee_{\alpha \in F} [\![A(\alpha)]\!]$

and analogously for the universal quantifier.

There are various generalizations of this basic set-up and, in particular, the random 
variables from the family F can be only partially defined on the sample space as long as 
their regions of undefinability have infinitesimal counting measures. 

In the particular construction in [Kra11a, Chpt.31] the sample space $\Omega$ is simply the
set $\{0,1\}^n$ for some non-standard $n \in \mathcal{M}$, and the family $F$ consisted of partial
random variables computed by students operating similarly as in the game $\mathcal{G}_b$.
More precisely, any partial function $\alpha \in F$ is computed by a P/poly Student who

• gets an input $\omega \in \{0,1\}^n$,

• there is a standard parameter $c$ such that the Student can ask the Teacher for the values
of $h^{(-1)}$ on $\omega(J_i)$ for up to $c$ values of $i \in [m]$ ($c$ is common to all inputs $\omega$ but may differ
for different $\alpha$),

• if the Teacher's answer $v$ to any query about $h^{(-1)}(\omega(J_i))$ does not satisfy $B(v) = b_i$ the
computation is aborted and $\alpha$ is undefined,

• if the computation is not aborted after any Teacher's answer the Student outputs at the
end an element of $\mathcal{M}$ (necessarily of size polynomial in $n$).

Unfortunately such a function $\alpha$ is typically undefined on a (standard) positive fraction (footnote 3) of
$\{0,1\}^n$ and hence the pair $\{0,1\}^n, F$ does not conform to the set-up of the construction.

What is needed is an infinite subset $H \subseteq \Omega$, definable in $\mathcal{M}$, such that every
student-computed $\alpha$ is defined on all but an infinitesimal fraction of $H$.

In the next section we shall construct such $H$ for uniform students that may all use some
common advice string. This is perfectly sufficient for the intended applications of the model.
But it would still be desirable to have such a hard-core set for non-uniform students of some
super-polynomial size. The reason is that a family $F$ based on such students can be modified
into a compact one (in the sense of [Kra11a, Sec.3.4]) and the compactness of $F$ implies
that the resulting model $K(F)$ has some nice model-theoretic properties mentioned in the
introduction. For example, non-uniform students of sub-exponential size (i.e. the functions
$S_1(x), S_2(x, y^1), \dots, S_c(x, y^1, \dots, y^{c-1})$ defining their moves are computed by circuits of total
size $2^{n^{o(1)}}$) define a family $F$ that is already compact.

3. A construction of a hard-core set

We assume that a non-standard $n \in \mathcal{M}$, a set system $\{J_i\}_i$ with the required
properties, a permutation $h$ and its hard bit $B(v)$, and some string $b \in \{0,1\}^m \setminus \mathrm{Rng}(g)$
are fixed.

Let $w \in \mathcal{M}$ be any string of size polynomial in $n$ and let $F_w \subseteq \mathcal{M}$ be
the family of partial random variables on $\{0,1\}^n$ defined as $F$ in Section 2 but allowing all
Students computing the random variables to use as an advice only the triple $(\{J_i\}_i, b, w)$.
This is perfectly sufficient for any application (footnote 4) of the eventual model in Sections 31.3 and
31.4 of [Kra11a] ($w$ can contain e.g. a proof of a propositional formula or a witness of the
membership of $b$ in an NP set, etc.) and has the great advantage that the family $F_w$ is
now countable.

Theorem 3.1. There exists an infinite set $H \subseteq \{0,1\}^n$, $H \in \mathcal{M}$, such that each
$\alpha \in F_w$ is defined on all samples from $H$.

Proof. Enumerate as $\alpha_1, \alpha_2, \dots$ the set $F_w$ in such a way that the Student defining $\alpha_k$
runs in time $\le m^k$ and asks at most $k$ queries, for all $k \ge 1$.

By the $\aleph_1$-saturation there exists a sequence in $\mathcal{M}$ of a non-standard length
$t$ whose $k$-th element is $\alpha_k$, for all standard $k$ (see [Kra11a, p.9]). We shall denote it
suggestively $(\alpha_i)_{i<t}$.

If we take $\alpha_1, \dots, \alpha_k$ we can compose the Students defining them by first computing $\alpha_1$,
if it is not aborted then instead of outputting a value computing $\alpha_2$, etc., and outputting
(arbitrary) values only at the end, if the computation is not aborted earlier. The resulting
function is computed in time $O(k m^k)$ using at most $k(k+1)/2 \le k^2$ queries. Hence by
Lemma 1.1 it is defined on at least $s_{k^2}$ samples from $\{0,1\}^n$. This yields the following

Footnote 3: Contrary to what was claimed in [Kra11a, L.31.2.1]. I am indebted to S. Buss for pointing it out. See
http://www.karlin.mff.cuni.cz/~krajicek/k2-upravy.html for an explanation and a correction.

Footnote 4: Note that [Kra11b] already provided an alternative construction with the same consequences as those
described in [Kra11a].

Claim: For each standard $k \ge 1$ there exists a definable subset $H_k \subseteq \{0,1\}^n$ of size at least
$s_{k^2}$ such that all $\alpha_1, \dots, \alpha_k$ are defined on all samples from $H_k$.
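
The composition step behind the Claim can be sketched as follows. Partial random variables are modelled here as Python functions returning `None` where they are undefined; this encoding, the names `compose` and `total_queries`, and the choice to output the tuple of all values at the end are assumptions of the sketch (the text allows arbitrary output values).

```python
from typing import Callable, Optional, Sequence, Tuple

Sample = Tuple[int, ...]
PartialVar = Callable[[Sample], Optional[int]]


def compose(variables: Sequence[PartialVar]) -> Callable[[Sample], Optional[Tuple[int, ...]]]:
    """Run the variables one after another; undefined as soon as one aborts."""
    def combined(omega: Sample) -> Optional[Tuple[int, ...]]:
        values = []
        for alpha in variables:
            value = alpha(omega)
            if value is None:        # this computation was aborted
                return None
            values.append(value)
        return tuple(values)         # output only at the very end
    return combined


def total_queries(k: int) -> int:
    """Queries used when the j-th variable asks at most j queries, j = 1..k."""
    return k * (k + 1) // 2


first = lambda omega: omega[0]
second = lambda omega: None if omega[1] == 0 else omega[1]
both = compose([first, second])
print(both((1, 0)), both((1, 1)))  # None (1, 1)
assert all(total_queries(k) <= k * k for k in range(1, 200))
```

The composed function is defined exactly on the samples where every component is defined, which is why a single set on which it is defined serves all of $\alpha_1, \dots, \alpha_k$ at once.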

By the Overspill the statement of the Claim holds also for the sequence $(\alpha_i)_{i<r}$ for some
non-standard $r < t$, and we can take $r$ small enough (but still non-standard) such that $s_{r^2}$
is non-standard and hence the set $H := H_r$ satisfies the statement of the theorem.

□ 

Let us remark in conclusion that proofs of the Boolean case hard-core set theorem of
Impagliazzo [Imp95] do not seem to work here. This is because $n$ small Students cannot be
combined into one somewhat larger Student as this would blow up the number of queries posed
to the Teacher.